PILCO: A Model-Based and Data-Efficient Approach to Policy Search
Authors
Abstract
In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
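The abstract does not spell out the closed-form approximate inference step. As a minimal sketch, assuming (as in the full PILCO paper) a Gaussian-process (GP) dynamics model with a squared-exponential kernel, the code below computes the exact mean and variance of a GP prediction when the input is itself Gaussian, x ~ N(mu, s2). Approximating the result by a Gaussian with these two moments is moment matching, which is how model uncertainty can be carried forward through a long-term prediction. Everything here is a 1-D illustration: the function names, hyperparameters, and toy data are hypothetical, and the real method works with multivariate states and actions.

```python
import math


def se_kernel(a, b, alpha2, ell2):
    """Squared-exponential kernel alpha2 * exp(-(a - b)^2 / (2 * ell2))."""
    return alpha2 * math.exp(-((a - b) ** 2) / (2.0 * ell2))


def inverse(m):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gp_fit(xs, ys, alpha2, ell2, sn2):
    """Return (K + sn2*I)^{-1} and beta = (K + sn2*I)^{-1} y for 1-D inputs xs."""
    n = len(xs)
    k = [[se_kernel(xs[i], xs[j], alpha2, ell2) + (sn2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    kinv = inverse(k)
    beta = [sum(kinv[i][j] * ys[j] for j in range(n)) for i in range(n)]
    return kinv, beta


def gp_predict(x, xs, kinv, beta, alpha2, ell2):
    """Standard GP posterior mean and variance of f(x) at a known input x."""
    kx = [se_kernel(x, xi, alpha2, ell2) for xi in xs]
    n = len(xs)
    mean = sum(b * k for b, k in zip(beta, kx))
    var = alpha2 - sum(kx[i] * kinv[i][j] * kx[j] for i in range(n) for j in range(n))
    return mean, var


def gp_moment_match(mu, s2, xs, kinv, beta, alpha2, ell2):
    """Exact mean and variance of f(x) when the input is uncertain, x ~ N(mu, s2)."""
    n = len(xs)
    nu = [xi - mu for xi in xs]
    # q_i = E_x[k(x_i, x)], available in closed form for the SE kernel
    q = [alpha2 / math.sqrt(s2 / ell2 + 1.0) * math.exp(-v * v / (2.0 * (s2 + ell2))) for v in nu]
    mean = sum(b * qi for b, qi in zip(beta, q))
    # Q_ij = E_x[k(x_i, x) * k(x_j, x)], also closed form
    r = 2.0 * s2 / ell2 + 1.0
    big_q = [[alpha2 ** 2 / math.sqrt(r)
              * math.exp(-(nu[i] ** 2 + nu[j] ** 2) / (2.0 * ell2))
              * math.exp(0.5 * s2 * ((nu[i] + nu[j]) / ell2) ** 2 / r)
              for j in range(n)] for i in range(n)]
    # Law of total variance: Var_x[posterior mean] + E_x[posterior variance]
    bqb = sum(beta[i] * big_q[i][j] * beta[j] for i in range(n) for j in range(n))
    trace = sum(kinv[i][j] * big_q[j][i] for i in range(n) for j in range(n))
    var = (bqb - mean ** 2) + (alpha2 - trace)
    return mean, var


if __name__ == "__main__":
    # Toy data (made up for illustration): samples of sin(x) at five inputs.
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    ys = [math.sin(x) for x in xs]
    kinv, beta = gp_fit(xs, ys, alpha2=1.0, ell2=1.0, sn2=0.01)
    print(gp_predict(0.5, xs, kinv, beta, 1.0, 1.0))            # input known exactly
    print(gp_moment_match(0.5, 0.2, xs, kinv, beta, 1.0, 1.0))  # input ~ N(0.5, 0.2)
```

As s2 shrinks to zero the moment-matched result reduces to the ordinary GP prediction; for s2 > 0 the extra variance reflects uncertainty about where the state is, on top of the model's own uncertainty.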
Similar papers
Data-Efficient Reinforcement Learning in Continuous-State POMDPs
We present a data-efficient reinforcement learning algorithm resistant to observation noise. Our method extends the highly data-efficient PILCO algorithm (Deisenroth & Rasmussen, 2011) into partially observed Markov decision processes (POMDPs) by considering the filtering process during policy evaluation. PILCO conducts policy search, evaluating each policy by first predicting an analytic distr...
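The snippet above is truncated before it describes the filter itself. Purely to illustrate what "the filtering process" means, the sketch below maintains a Gaussian belief over a scalar latent state from noisy observations with a standard Kalman predict/update cycle. The known linear dynamics x' = a*x + w and the observation model y = x + v are simplifying assumptions made here, not the paper's model (which builds on PILCO's learned GP dynamics), and all names are hypothetical.

```python
def predict(mu, var, a, q):
    """Propagate a Gaussian belief through linear dynamics x' = a*x + w, w ~ N(0, q)."""
    return a * mu, a * a * var + q


def update(mu, var, y, r):
    """Condition the belief N(mu, var) on an observation y = x + v, v ~ N(0, r)."""
    gain = var / (var + r)
    return mu + gain * (y - mu), (1.0 - gain) * var


def filter_observations(mu, var, ys, a, q, r):
    """Alternate predict and update steps over a sequence of observations."""
    beliefs = []
    for y in ys:
        mu, var = predict(mu, var, a, q)
        mu, var = update(mu, var, y, r)
        beliefs.append((mu, var))
    return beliefs


if __name__ == "__main__":
    # Hypothetical numbers: prior N(0, 1), one observation y = 2 with noise variance 1.
    print(update(0.0, 1.0, 2.0, 1.0))  # (1.0, 0.5)
```

A controller acting on the filtered mean, rather than on the raw observation, sees a less noisy input; accounting for that filtering step during policy evaluation is the extension the snippet describes.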
Probabilistic Inference for Fast Learning in Control
How can we learn control tasks as fast as possible given knowledge from experience only?
• autonomous learning in control from scratch using experience only (no demonstrations)
• no task-specific prior assumptions
• learn fast (data-efficient) model-based RL
• deal with model bias during long-term planning: only small data sets are available for learning dynamics models
1 Key Idea and Algorithm
• lear...
Safe Policy Search with Gaussian Process Models
We propose a method to optimise the parameters of a policy which will be used to safely perform a given task in a data-efficient manner. We train a Gaussian process model to capture the system dynamics, based on the PILCO framework. Our model has useful analytic properties, which allow closed form computation of error gradients and estimating the probability of violating given state space const...
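To illustrate why a Gaussian state prediction makes constraint violation analytically tractable, the sketch below computes, for a scalar predicted state x ~ N(mu, var), the probability of leaving a box constraint lo <= x <= hi, together with its exact gradient with respect to mu. The box constraint, the 1-D state, and the function names are assumptions for illustration; the snippet is truncated before stating how the paper formulates its constraints.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def violation_probability(mu, var, lo, hi):
    """P(x < lo or x > hi) for a scalar predicted state x ~ N(mu, var)."""
    sd = math.sqrt(var)
    inside = normal_cdf((hi - mu) / sd) - normal_cdf((lo - mu) / sd)
    return 1.0 - inside


def violation_probability_dmu(mu, var, lo, hi):
    """Analytic derivative of violation_probability with respect to the mean mu."""
    sd = math.sqrt(var)
    # d/dmu Phi((b - mu) / sd) = -phi((b - mu) / sd) / sd
    return (normal_pdf((hi - mu) / sd) - normal_pdf((lo - mu) / sd)) / sd


if __name__ == "__main__":
    # Hypothetical constraint |x| <= 1.96 on a standard-normal predicted state.
    print(violation_probability(0.0, 1.0, -1.96, 1.96))  # roughly 0.05
    print(violation_probability_dmu(0.5, 1.0, -1.96, 1.96))
```

Because both quantities are closed-form functions of the predicted mean and variance, they can be combined with other analytic gradients during policy optimisation without any sampling.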
Data-Efficient Reinforcement Learning in Continuous State-Action Gaussian-POMDPs
We present a data-efficient reinforcement learning method for continuous state-action systems under significant observation noise. Data-efficient solutions under small noise exist, such as PILCO, which learns the cartpole swing-up task in 30 s. PILCO evaluates policies by planning state-trajectories using a dynamics model. However, PILCO applies policies to the observed state, therefore planning i...
Optimizing Long-term Predictions for Model-based Policy Search
We propose a novel long-term optimization criterion to improve the robustness of model-based reinforcement learning in real-world scenarios. Learning a dynamics model to derive a solution promises much greater data-efficiency and reusability compared to model-free alternatives. In practice, however, model-based RL suffers from various imperfections such as noisy input and output data, delays and...